Multi-omics differences in the bone marrow between essential thrombocythemia and prefibrotic primary myelofibrosis

Essential thrombocythemia (ET) and prefibrotic primary myelofibrosis (pre-PMF) are Philadelphia chromosome-negative myeloproliferative neoplasms. These conditions share overlapping clinical presentations; however, their prognoses differ significantly. Current morphological diagnostic methods lack reliability in subtype differentiation, underlining the need for improved diagnostics. The aim of this study was to investigate the multi-omics alterations in bone marrow biopsies of patients with ET and pre-PMF to improve our understanding of the nuanced diagnostic characteristics of both diseases. We performed proteomic analysis with 4D direct data-independent acquisition and microbiome analysis with 2bRAD-M sequencing technology to identify differential protein and microbe levels between untreated patients with ET and pre-PMF. Laboratory and multi-omics differences were observed between ET and pre-PMF, encompassing diverse pathways, such as lipid metabolism and immune response. The pre-PMF group showed an increased neutrophil-to-lymphocyte ratio and decreased high-density lipoprotein and cholesterol levels. Protein analysis revealed significantly higher CXCR2, CXCR4, and MX1 levels in pre-PMF, while APOC3, APOA4, FABP4, C5, and CFB levels were elevated in ET, with diagnostic accuracy indicated by AUC values ranging from 0.786 to 0.881. Microbiome assessment identified increased levels of Mycobacterium, Xanthobacter, and L1I39 in pre-PMF, whereas Sphingomonas, Brevibacillus, and Pseudomonas_E were significantly decreased, with AUCs for these genera ranging from 0.833 to 0.929. Our study provides preliminary insights into the proteomic and microbiome variations in the bone marrow of patients with ET and pre-PMF, identifying specific proteins and bacterial genera that warrant further investigation as potential diagnostic indicators. These observations contribute to our evolving understanding of the multi-omics variations and possible mechanisms underlying ET and pre-PMF. Supplementary Information The online version contains supplementary material available at 10.1007/s10238-024-01350-y.


Supplementary Method 1.1 Proteomic analysis 1.1.1 Sample processing
For sample processing, the following steps were undertaken.After deparaffinizing the sample, it was scraped into a centrifuge tube.BT lysis solution, ammonium bicarbonate (ABC) solution, and PMSF were added to cover the sample.Noncontact ultrasonic treatment was conducted at 4°C for 10 min.After ultrasonication, the sample was immediately centrifuged to settle the solution.The samples were heated at 90°C and centrifuged at 800 rpm for 15 minutes, then briefly every 5 minutes to prevent drying.They were cooled quickly to room temperature after heating.Iodoacetamide was added, followed by incubation in the dark at room temperature for 30 minutes.ABC solution was added to achieve a final concentration of BT less than 0.5%.Trypsin and LysC were added for combined enzymatic digestion and oscillated at 37°C and 800 rpm.After digestion, the samples were desalted and subjected to machine testing.

MS detection
For MS detection, the subsequent parameters were employed.The timsTOF Pro2 collected data using the parallel accumulation serial fragmentation (PASEF) DIA mode in the positive ion mode.The capillary voltage was set to 1400 V, the first-and second-stage mass spectrometry scan range being 100-1700 m/z.The ion mobility window range (1/K0) was 0.7-1.4Vs/cm 2 .The ion accumulation and release time were set to 100 ms, achieving an ion utilization rate close to 100%.A complete PASEF cycle consisted of 10 PASEF secondary stages, with a total cycle time of 1.1 s.In the method settings, a polygonal frame was used to effectively filter out low charge-tomass ratios and singly charged ions.When selecting the precursor ion, the target intensity of the precursor ion was 20,000, with an intensity threshold of 2,500 set for selection for secondary analysis.In a single 100-ms PASEF, 4 quadrupole isolation windows and 4 tims mobility windows were set, with the quadrupole isolation window being 28 Da.The collision energy varied linearly with ion mobility 1/K0, with fragmentation energies ranging from 20 to 59 eV corresponding to ion mobilities of 1/K0=0.6-1.6Vs/cm 2 .

Data analysis
For data analysis, the DIA raw data were processed using Spectronaut Pulsar software, the sequence search file being "UniProt-Homo sapiens-9606-2023.2.1.fasta".Analysis was conducted using the directDIA mode without building a library.

Microbiomics analysis 1.2.1 DNA extraction
For the extraction of sample DNA, the UPure FFPE Tissue DNA Kit (BioKeystone, Cat.No. M2015-A96) was used to extract genomic DNA from paraffin tissue sections.To assess environmental contamination, five paraffin areas without sample tissue on the edge of the sections were carefully selected using a sterile scalpel, and DNA extracted from them was used for quality control.Agarose gel electrophoresis (1%) was utilized to assess the integrity of the DNA.A Nanodrop 2000 (Thermo) was employed to validate the concentration and purity of the DNA.

Library construction and sequencing
After extracting DNA, we performed library construction and sequencing utilizing the 2bRAD-M technique, closely following Wang et al.'s [1] protocol with slight modifications.We digested DNA, ranging from 1 pg to 200 ng, using 4 U of the BcgI enzyme (NEB) for 3 hours at 37°C.After

Quality control of sequencing data
The raw image data files obtained from high-throughput sequencing were transformed into raw sequencing sequences through base calling analysis.
We refer to this as Raw Data.The results were stored in the FASTQ file format, which contained the sequence information of the sequencing reads and their corresponding quality information.Sequencing employed the PE150 strategy.Based on the recognition sites of the experimental enzymes, the reads were scanned and sequences containing the enzyme-cut fragments (called "Enzyme Reads") were extracted.They were then filtered according to the following conditions to generate Clean Reads: (a) Filter out reads where the proportion of unknown bases exceeds 8%.(b) Remove lowquality reads (those where the number of bases with a quality score below 30 exceeds 20% of the total bases in the read) (TableS1).

Identifying species-specific 2bRAD-M markers based on the most 1.2.5 comprehensive genome database
Initially, we acquired 404,199 microbial genomes (encompassing bacteria, fungi, and archaea) from the GTDB and Ensembl databases.Following this, we employed built-in Perl scripts to sample restriction fragments from these microbial genomes using each of the 16 types 2b restriction enzymes.This effort culminated in the creation of a large 2bRAD microbial genome database.2bRAD tags unique to a species-level taxon, with no overlap with other species, were established as species-specific 2bRAD markers.These markers together constituted the 2bRAD marker database.

Calculation of relative abundance
First, to identify microbial species within each sample, all sequenced 2bRAD tags after quality control were mapped (using a built-in Perl script) against the 2bRAD marker database, which contains all 2bRAD tags theoretically unique to each microbial species in the database.To mitigate the number of false positives in species identification, the G score was calculated for each species detected in a sample.This score represents the harmonic mean between the read coverage of 2bRAD markers belonging to a species and the number of all possible 2bRAD markers of that species.
The threshold of the G score for a false positive discovery of microbial species was set to 5 [2].
S: the number of reads assigned to all 2bRAD markers belonging to species i within a sample.t: number of all 2bRAD markers of species i that have been sequenced within a sample.Subsequently, we computed the mean read coverage for all 2bRAD markers of each species, indicating the number of individuals belonging to a species present in a sample at a given sequencing depth.The relative abundance of a given species was then calculated as the proportion of microbial individuals belonging to a species out of the total number of individuals from known species that could be detected within a sample.S: the number of reads assigned to all 2bRAD markers of species i within a sample.
T: the number of all theoretical 2bRAD markers of species i 2. Supplementary Figure   Table

Supplementary Table
digestion, adaptors were attached to the DNA fragments.The ligation process combined 5 µl of the digested DNA with 10 µl of a ligation master mix, comprising 0.2 µM of both adaptors and 800 U of T4 DNA ligase (NEB).The ligation was conducted at 4 ℃ for 12 hours.Subsequently, the ligation products were amplified and analyzed by 8% polyacrylamide gel electrophoresis.Bands of approximately 100bp were isolated from the polyacrylamide gel, and DNA was eluted from the gel over 12 hours at 4°C using nuclease-free water.During PCR, we utilized platform-specific barcode primers to introduce unique barcodes for each sample.Each 20 µl PCR mixture comprised 25 ng of gel-extracted PCR product, 0.2 µM of each primer, 0.3 mM dNTP, 1×Phusion HF buffer, and 0.4 U Phusion highfidelity DNA polymerase (NEB).PCR products were cleaned with the QIAquick PCR purification kit (Qiagen) and subsequently sequenced on the Illumina Novaseq PE150 platform.The 2bRAD-M procedure was executed at Qingdao OE Biotech Co., Ltd.(Qingdao, China).

Fig. S1
Fig. S1 Quality control.Number of peptides (A) and proteins (B) detected in every sample

Fig. S3
Fig. S3 Identification of ET and pre-PMF based on bone marrow microbiota.A-F ROC curves for the six discriminative genera.
S1. Sequencing information summary number of raw reads, enzyme reads, and clean reads and percentage of enzyme reads and clean reads